Recent research has explored the increasingly important role of social media by examining the 
dynamics of individual and group behavior, characterizing patterns of information diffusion, and 
identifying influential individuals. In this paper we suggest a measure of causal relationships between 
nodes based on the information-theoretic notion of transfer entropy, or information transfer. This 
theoretically grounded measure is based on dynamic information, captures fine-grain notions of 
influence, and admits a natural, predictive interpretation. Causal networks inferred by transfer 
entropy can differ significantly from static friendship networks because most friendship links are not 
useful for predicting future dynamics. We demonstrate through analysis of synthetic and real-world 
data that transfer entropy reveals meaningful hidden network structures. In addition to altering our 
notion of who is influential, transfer entropy allows us to differentiate between weak influence over 
large groups and strong influence over small groups. 



I. INTRODUCTION 

Recent years have witnessed an explosive growth of various social media
sites such as online social networks, discussion forums and message
boards, and inter-linked blogs. For researchers, social media serves as
a fertile ground for examining social interactions on an unprecedented
scale [4]. One important problem is the characterization and
identification of influentials, which can be defined as users who
influence the behavior of large numbers of other users. Recent work on
influence propagation has used numerous characterizations of
influentials based on topological centrality measures such as PageRank
score [8], [11]. To characterize influence in Twitter, researchers have
suggested number of followers, mentions, and retweets [5], and PageRank
of the follower network [9]. It has been observed, however, that purely
structural measures of influence can be misleading [6], and high
popularity does not necessarily imply high influence [14], [16]. More
recent work has used the size of the cascade trees [1] and the
influence-passivity score [14]. One serious drawback of existing methods
is that they are based on explicit causal knowledge (i.e., A responds to
B), whereas for many data sets such knowledge is not available and needs
to be discovered.

Here we suggest a model-free approach to uncovering causal relationships
and identifying influential users based on their capacity to predict the
behavior of other users, through the information-theoretic notion of
transfer entropy, interchangeably referred to as information transfer.
In a nutshell, transfer entropy between two stochastic processes
characterizes the reduction of uncertainty in one process due to the
knowledge of the other process; a mathematical definition is given
below. Transfer entropy can be thought of as a nonlinear generalization
of Granger causality [3], and has been used extensively in computational
neuroscience, e.g., for examining causal relationships in cortical
neurons [7]. In contrast to other correlation measures such as mutual
information, transfer entropy is asymmetric and allows differentiation
in the direction of information flow. Furthermore, whereas most existing
studies are concerned with aggregate measures of influence, the approach
outlined here allows more fine-grained analysis of information diffusion
by analyzing information transfer on each existing link in the network.
Finally, our approach is model-free. Information-theoretic measures
allow us to statistically characterize our uncertainty without making
assumptions about human behavior.

The rest of this paper is organized as follows. We begin by describing
the basic intuition and mathematics behind information transfer, and
briefly mention computational issues of the approach. In Section III A
we present results of our simulation with synthetically generated data,
where we thoroughly examine how the information transfer depends on
various characteristics of the data generating process. In Section III B
we present our results on real-world data extracted from user activities
on Twitter. We conclude the paper by discussing results and some future
work in Section IV.



II. TRANSFER ENTROPY 

A. Notation 

For each user, X, we record the history of activity, 
e.g., timing of tweets, as a sequence of times
$S_X = \{t_i : 0 < t_1 < t_2 < \ldots\}$. In general, we assume each user's
activity is described by some stochastic point process. 
We are limited by finite data to consider finite temporal 
resolution, so we introduce the binned random variable $B_X(t, t')$,
which describes X's activity in the time window between $t'$ and $t$.
If we observe the actions of a user for some long period
of time T, we can define probabilities over these coarse-grained
variables. Fix $\delta \in \mathbb{R}^+$, then

$$P(B_X(t, t-\delta) = X_t) = \mathbb{E}_t\left[ B_X(t, t-\delta) = X_t \right]$$

Similarly, we could define a joint probability distribution
over a sequence of adjacent bins,

$$P(B_X(t, t-\delta_0) = X_t, B_X(t-\delta_0, t-\delta_0-\delta_1) = X_{t-1}, \ldots),$$

with widths $\delta_0, \ldots, \delta_k \in \mathbb{R}^+$. We will omit the
binning function for succinctness, $P(X_t, \ldots, X_{t-k})$. We can write
this even more compactly by defining
$X_t^{(t-k)} = \{X_t, \ldots, X_{t-k}\}$.

The dynamics of a user may depend on users they are linked to in some
unknown, arbitrary way. Therefore, for two users X and Y, with
activities recorded by $S_X$, $S_Y$, we define a joint probability
distribution using a common set of bins denoted with widths
$\delta_0, \delta_1, \ldots$ as $P(X_t^{(t-l)}, Y_t^{(t-k)})$. Conditional
and marginal probability distributions are defined in the usual way and
we use the standard definition for conditional entropy for discrete
random variables A, B distributed according to P(A, B),

$$H(A|B) = -\sum_{A,B} P(A, B) \log P(A|B).$$

B. Definition of transfer entropy 

The transfer entropy introduced in [15] is defined as

$$T_{X \to Y} = H\left(Y_t \mid Y_{t-1}^{(t-k)}\right) - H\left(Y_t \mid Y_{t-1}^{(t-k)}, X_{t-1}^{(t-l)}\right) \qquad (2)$$

The first term represents our uncertainty about $Y_t$ given Y's history
only. The second term represents the smaller uncertainty when we know
X's history as well. Thus, transfer entropy explicitly describes the
reduction of uncertainty in $Y_t$ due to knowledge of X's recent
activity. Note that information transfer is asymmetric, as opposed to
mutual information, and thus better suited for characterizing directed
information transfer. For simplicity, we take $l = k$ from here on.
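
As an illustration of Eq. (2), a minimal Python sketch of a plug-in (frequency-count) estimator follows. It is not the procedure used for the results reported here: it assumes equal-width bins, binary activity sequences, $l = k$, and no bias correction. The function names `conditional_entropy` and `transfer_entropy` are hypothetical.

```python
from collections import Counter
from math import log2


def conditional_entropy(pairs):
    """Plug-in estimate of H(A|B) in bits from a list of (a, b) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    marginal_b = Counter(b for _, b in pairs)
    # H(A|B) = -sum_{a,b} P(a,b) log P(a|b), with P(a|b) = count(a,b) / count(b)
    return -sum(c / n * log2(c / marginal_b[b]) for (a, b), c in joint.items())


def transfer_entropy(x, y, k=1):
    """Plug-in estimate of T_{X->Y} for equal-width binned 0/1 sequences (l = k)."""
    samples_y, samples_xy = [], []
    for t in range(k, len(y)):
        y_hist = tuple(y[t - k:t])  # Y's previous k bins
        x_hist = tuple(x[t - k:t])  # X's previous k bins
        samples_y.append((y[t], y_hist))
        samples_xy.append((y[t], (y_hist, x_hist)))
    # Uncertainty about Y_t given Y's history, minus uncertainty given both histories
    return conditional_entropy(samples_y) - conditional_entropy(samples_xy)
```

If `y` simply repeats `x` with a one-bin delay, the estimate of $T_{X \to Y}$ approaches one bit while $T_{Y \to X}$ stays near zero, which is the asymmetry the definition is meant to capture.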



C. Sampling problems and solutions 

The use of information-theoretic techniques to analyze real-world point
processes has been studied almost exclusively in the context of neural
activity [17]. Therefore, it is in this literature that the problems
associated with estimating entropies for sparse point process data have
been explored most thoroughly. The fundamental problem is that, in the
absence of sufficient data, estimating entropies from probability
distributions based on binned frequencies leads to systematic bias [13].
Intuitively, if we have k bins of history then we need $O(2^k)$ pieces
of data in order to sample all possible histories.



A variety of remedies are available and we make use 
of several. The most obvious solution is to restrict our- 
selves to situations where we have adequate data. In the 
subsequent analysis, we filter out users that are below a 
certain activity level. In practice, however, raising our 
activity threshold high enough to guarantee convergence 
of entropies would eliminate almost all users from our 
dataset. 

The next remedy to apply is to estimate the average magnitude of the
systematic bias that results from using sparse data and subtract it from
our estimate. When we calculate the entropies in Eq. (2) we subtract out
the Panzeri-Treves bias estimate [12]. Fig. 2 illustrates the effect of
this bias correction as a function of amount of data collected.
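
A first-order correction of this kind can be sketched in Python as follows. The plug-in conditional entropy is biased downward by roughly $\sum_b (R_b - 1) / (2 N \ln 2)$ bits, where $N$ is the number of samples and $R_b$ is the number of outcomes with nonzero probability given conditioning state $b$. As a simplifying assumption, the sketch uses the number of outcomes actually observed for $R_b$, rather than the estimate of relevant outcomes that the full procedure of [12] provides. The function name is hypothetical.

```python
from collections import Counter
from math import log, log2


def corrected_conditional_entropy(pairs):
    """Plug-in H(A|B) in bits plus a first-order bias term.

    pairs is a list of (a, b) samples. The correction counts observed
    outcomes only (an assumption, simpler than the procedure in [12]).
    """
    n = len(pairs)
    joint = Counter(pairs)
    marginal_b = Counter(b for _, b in pairs)
    plug_in = -sum(c / n * log2(c / marginal_b[b]) for (a, b), c in joint.items())
    # sum over b of (R_b - 1): distinct (a, b) pairs minus distinct b values
    extra_outcomes = len(joint) - len(marginal_b)
    return plug_in + extra_outcomes / (2 * n * log(2))
```

Applying the same correction to both terms of Eq. (2) lowers the transfer entropy estimate, because the term conditioned on both histories has more occupied bins and therefore receives the larger upward correction.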

The definition in Eq. (2) implicitly depends on bin
widths specified by the $\delta_i$'s. The simplest procedure, and
the one taken in the neural spike train literature, is to set
all the bins to have equal width. We have a great deal of 
pre-existing empirical knowledge about human activity 
that can help us improve on this method. Many studies 
have shown that humans exhibit a heavy tail in the dis- 
tribution of their response times to communications [2]. 
This implies that bins accounting for recent activity 
should be narrower while bins accounting for older ac- 
tivity can be wider. We can even base these bin widths 
on measured response times, if such data is available. Us- 
ing more informative bins means we can use fewer bins, 
reducing the effect of sampling problems. 
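
A minimal Python sketch of such non-uniform binning is given below. It assumes event times are sorted, that each bin is the half-open window $(t', t]$, and that a bin takes the value 1 if it contains at least one event; the function name `binned_history` is hypothetical.

```python
from bisect import bisect_right


def binned_history(events, t, widths):
    """Return 0/1 indicators for adjacent bins ending at time t.

    events: sorted list of event times.
    widths: bin widths, most recent bin first (e.g. narrow, then wider).
    """
    indicators = []
    upper = t
    for width in widths:
        lower = upper - width
        # number of events in the window (lower, upper]
        count = bisect_right(events, upper) - bisect_right(events, lower)
        indicators.append(1 if count > 0 else 0)
        upper = lower
    return tuple(indicators)
```

For example, with events at times 1.0, 5.0 and 9.5, `binned_history(events, 10, [1, 2, 4])` covers the windows (9, 10], (7, 9] and (3, 7] and returns `(1, 0, 1)`.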

A final technique to reduce bias is discussed in [17] 
and uses a class of binless entropy estimators. These 
techniques carry their own mathematical difficulties and 
we will not consider them here. With these tools in hand, 
we can proceed to use information transfer to analyze 
user activity in social media. 



III. RESULTS 

In this section we report the results of our experiments 
with both synthetic data and real world data from Twit- 
ter. The ultimate goal is to infer information transfer 
between agents in the network by analyzing their pat- 
terns of activity. Patterns of activity could include many 
things including timing, content, and medium of mes- 
sages. We focus only on the timing of activity on Twitter 
(tweeting of URLs). In principle, our analysis could be 
extended to include more complex information, but, as 
discussed, this would require either more data or better 
methods for dealing with sparse data. 

We test and validate our ability to infer information 
transfer from patterns of activity in two ways. First, 
while our information-theoretic analysis of social net- 
work data uses only timing of activity, the data includes 
unique identifiers allowing us to track the flow of informa- 
tion through the network. On Twitter, we track specific 
URLs. We can use the spread of these trackable pieces 
of information to confirm that the information transfer 
inferred solely from the timing of activity corresponds to
actual exchanges of information. 

For the synthetic data, we dictate that an agent's ac- 
tivity depends on its neighbors' activity in some fixed 
way. This allows us to check how well information trans- 
fer recovers the hidden dependence structure from activ- 
ity patterns alone. For instance, even without knowing 
anything about the network structure, we find that a 
sufficient amount of data allows perfect reconstruction of 
the underlying network. 



A. Experiments with synthetic data 

To form a better understanding of different factors im- 
pacting information transfer, we performed extensive ex- 
periments with synthetically generated data. Ideally, we 
would like our synthetic data to reflect, in a tunable way,
the challenges we face with real world data. These chal- 
lenges include a long tail for human response times, het- 
erogeneous response to neighbors' activity, background 
noise affecting node dynamics, incorrect data, and insuf- 
ficient data. We explore these challenges first for a pair 
of nodes, and then for an entire network. 

We model user activity as a coupled, non-homogeneous Poisson point
process. Suppose that we have two nodes and a single link from X to Y.
We can characterize Y's activity in terms of a time-dependent rate. We
define $S_X^t = S_X \cap [0, t)$, that is, the activity for X until
time t.

$$\lambda_Y(t \mid S_X^t) = \mu + \gamma \sum_{t_i \in S_X^t} g(t - t_i) \qquad (3)$$

The first term, $\mu$, represents a constant rate of background
activity. The second term represents a time-dependent increase in the
rate of activity in response to activity from a neighbor. The strength
of influence of X is parametrized by $\gamma$. In practice, we will set
the background rate equal to a constant and vary the relative strength
$\gamma/\mu$ through the parameter $\gamma$. The time dependence of the
influence is captured by the function g. We set
$g(\Delta t) = \min\left(1, (\ldots)\right)$ to reflect the observed
fact that the distribution of human response times is characterized by
a long tail [2].

Along with a causal network, Eq. (3) defines a generative model for
point process activity. We can efficiently generate activity according
to this model using the thinning method discussed in [10]. We vary the
total amount of data by fixing the background rate $\mu = 1$ event/day
and varying the total amount of observation time, T. Equivalently, we
could have fixed T and varied the rate of activity. After fixing the
parameters, we can generate data and then use that data to infer the
appropriate probabilities to calculate information transfer according
to Eq. (2).
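
The thinning idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the generator used for the experiments: the function names are hypothetical, time is measured in days, and `kernel` (flat for one hour, then a cubic power-law decay) is an assumed stand-in for $g$. The only property the sampler relies on is that the kernel never exceeds 1 and never increases with $\Delta t$, so that between two events of X the rate of Y is bounded by its value at the start of the interval.

```python
import numpy as np


def kernel(dt, tau=1.0 / 24.0):
    """Assumed response kernel: 1 up to tau, then (tau/dt)^3. Not from the paper."""
    dt = np.asarray(dt, dtype=float)
    return np.minimum(1.0, (tau / np.maximum(dt, 1e-12)) ** 3)


def simulate_y(x_events, mu, gamma, T, rng):
    """Sample Y's event times on [0, T) given X's events, by thinning."""
    x = np.sort(np.asarray(x_events, dtype=float))
    y_events, t = [], 0.0
    while t < T:
        idx = np.searchsorted(x, t, side="right")  # X events at or before t
        next_x = x[idx] if idx < len(x) else np.inf
        # Rate is non-increasing until the next X event, so this bounds it
        bound = mu + gamma * kernel(t - x[:idx]).sum()
        s = t + rng.exponential(1.0 / bound)  # candidate event time
        if s >= min(next_x, T):
            t = min(next_x, T)  # rate jumps at next_x; restart from there
            continue
        rate = mu + gamma * kernel(s - x[:idx]).sum()
        if rng.uniform() < rate / bound:  # accept with probability rate/bound
            y_events.append(s)
        t = s
    return np.array(y_events)
```

A driving sequence for X can be drawn as a homogeneous Poisson process, for instance by sampling a Poisson-distributed number of events and placing them uniformly on $[0, T)$.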

As discussed in Sec. II C, we take a variety of measures to ensure good
estimation. In this case, we directly control the amount of data through
the parameter T. For the bin widths we choose $\delta_0 = 1$ sec, fixing
the finest temporal resolution. For the history we choose wider bin
widths for less recent history. In the synthetic examples we take the
past three hours of history into account by choosing $\delta_1 = 1$
hour, $\delta_2 = 2$ hours. Also, it should be assumed that the
Panzeri-Treves bias estimate has been taken into account, except in
Fig. 2(a) where we compare results without bias correction.

FIG. 1: If we have influence from $X \to Y$ but not vice versa,
the asymmetry in the information transfer correctly reflects
the direction of influence. Information transfer plotted for a
single pair of users.

Note that in the example in Eq. (3), we have allowed X to affect Y, but
not vice versa. As a first test we can generate some data for a pair of
users and then compare $T_{X \to Y}$ and $T_{Y \to X}$. In Fig. 1 we
compare these two quantities when $\gamma/\mu = 2$ as a function of the
total observation time T.

In Fig. 2 we examine the accuracy and convergence of
information transfer estimates as a function of time both 
with and without bias correction. We ran 200 trials and 
plot the mean and standard deviation of the information 
transfer estimate at each time step. Clearly, there is a 
systematically high estimate in the low sampling regime, 
but, even in that case, higher influence leads to a higher 
information transfer on average. The Panzeri-Treves bias
correction drastically reduces, but does not completely 
eliminate, this systematic error. 

Next, we consider the same scenario, where we generate X, Y according
to some stochastic process, but now imagine that we do not see all
activity. That is, what if we do not see every event due to limited
sampling? This is often the case, for instance, with Twitter data,
where researchers typically have access to only a small fraction of all
tweets, ranging from 1% to 20%. So we set a sampling parameter $f$, and
say that for each $t_i \in S_X$, we only keep that event with
probability $f$. A summary of how the final transfer entropy,
$T_{Y \to X}$, depends on the sampling rate, $f$, is given in Fig. 3.
We show the results after 500 days to guarantee enough data to be very
close to convergence. We see that sampling drastically reduces the
inferred transfer entropy, destroying our ability to deduce flow of
information.
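
The sampling step described above amounts to independent thinning of the event list. A minimal Python sketch, with the hypothetical function name `subsample`, is:

```python
import random


def subsample(events, f, rng):
    """Keep each event independently with probability f (0 <= f <= 1)."""
    return [t for t in events if rng.random() < f]
```

Passing an explicitly seeded generator such as `random.Random(0)` keeps the subsampling reproducible across trials.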

FIG. 2: Mean and std for the estimate of information transfer
averaging over 200 pairs of users with $\gamma/\mu = 0, 2$ as a function
of time. (a) Results without correcting for bias and (b) with
Panzeri-Treves bias correction [12].

FIG. 3: A summary of the mean and std of the inferred value
of $T_{Y \to X}$ averaged over 200 trials as a function of the sampling
rate, with T = 500 days and $\gamma/\mu = 2$.

So far, we have only considered two nodes with a single link between
them. Now, we want to consider a directed, causal network of N nodes,
with some arbitrary connectivity pattern. We consider a similar
stochastic model as defined in Eq. (3), except now we denote the set of
Y's neighbors (i.e., people who can influence Y) as $\mathcal{N}(Y)$.

$$\lambda_Y(t \mid S^t_{\mathcal{N}(Y)}) = \mu + \sum_{X \in \mathcal{N}(Y)} \gamma_X \sum_{t_i \in S_X^t} g(t - t_i) \qquad (4)$$

To begin we imagine $\gamma_X = \gamma$ for all neighbors, but in
general a node may be affected more strongly by some neighbors than
others. A sample of activity generated according to this model is given
in Fig. 4.

FIG. 4: Each row represents a different user. Each line represents an
event for that user over a time period of thirty days. With enough data
we could calculate the information transfer between each pair of users
and recover the unknown network structure exactly.

The challenge is to take the information given by the activity and
recover the underlying graph structure. For each pair of nodes, X, Y,
we calculate $T_{X \to Y}$. Then we pick some threshold $T_0$, and if
$T_{X \to Y} > T_0$, we consider there to be an edge from $X \to Y$,
otherwise not. We could check our true positive rate and false positive
rate as a function of $T_0$, as shown in Fig. 5(a) for N = 20,
$\gamma/\mu = 1.0$ and time = 450 days. We show an example of the
recovered versus actual network in Fig. 5(b), using a threshold picked
according to F-measure.

The previous example was chosen to show what kinds of errors arise
given a weak signal. In general, with either enough data or strong
enough influence, we can perfectly recover the underlying graph
structure. If we consider the area under the ROC curve (AUC), as in
Fig. 5(a), then an AUC of 1 corresponds to perfect reconstruction of
the graph. We summarize the AUCs for random networks with N = 20 and
$\langle k \rangle = 3$, while varying T and $\gamma/\mu$ in Fig. 6.

As a final experiment, we can consider the effect of allowing different
$\gamma$ between different pairs of nodes. Fig. 7 shows that transfer
entropy is able to recover the relative influence well.

In principle, there are many other effects we could have considered to
make a more realistic synthetic model. Background and influence rates
should vary for different individuals. There may be periodicity defined
by daily, weekly, and monthly cycles. However, because information
transfer makes no model assumptions, it is relatively insensitive to
such details. The main constraint is data, which is why we focused on
sensitivity to amount and quality of observations.


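The network reconstruction step of this section, thresholding pairwise transfer entropies at $T_0$ and scoring the result against the known graph, can be sketched in Python as follows. The function names are hypothetical, the F-measure is assumed to be the balanced F1 score, and every true edge is assumed to appear among the scored pairs.

```python
def edge_scores(te, true_edges, threshold):
    """Score the graph obtained by keeping pairs with transfer entropy > threshold.

    te: dict mapping (source, target) -> transfer entropy estimate.
    true_edges: set of (source, target) pairs in the generating network.
    Returns (true positive rate, false positive rate, F1).
    """
    predicted = {pair for pair, value in te.items() if value > threshold}
    tp = len(predicted & true_edges)
    fp = len(predicted - true_edges)
    fn = len(true_edges - predicted)
    tn = len(te) - tp - fp - fn
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return tpr, fpr, f1


def best_threshold(te, true_edges):
    """Pick the observed transfer entropy value that maximizes F1 as threshold."""
    candidates = sorted(set(te.values()))
    return max(candidates, key=lambda t0: edge_scores(te, true_edges, t0)[2])
```

Sweeping the threshold over all observed values and plotting the resulting (false positive rate, true positive rate) pairs traces out an ROC curve of the kind discussed above.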

B. Results for Twitter dataset 

Twitter is a popular micro-blogging service. As of July 
2011, users send 200 million tweets per day. Twitter has 
become an important tool for researchers both due to 
the volume of activity and because of the easily available 
tools for data collection. Twitter's "Gardenhose" API
allows access to 20% to 30% of all tweets.



FIG. 5: (a) ROC curve and (b) transfer-entropy induced
graph for the synthetically generated data described in the
text. Threshold is chosen according to F-measure. Black
solid lines correspond to true positives, red dashed lines to
false positives and blue dotted lines to false negatives.

FIG. 6: AUC of the network inferred using transfer entropy
as a function of T, with $\gamma/\mu = 2, 4$.

FIG. 7: Information transfer between pairs of nodes for varying
$\gamma/\mu$ with T = 500 days. The black line corresponds to the
mean information transfer for a given $\gamma/\mu$ and the shaded
region denotes the standard deviation after 100 trials.

Unfortunately, as discussed in Sec. III A, filtering of data can lead
to a drastic reduction in the measured information transfer. Instead,
the Gardenhose API was used to identify URLs being tweeted. Then, the
search API was used to find all mentions of these URLs in any tweets by
any users. In this way, the filtering limitation is avoided, while we
restrict ourselves to the domain of URL posting. Additionally, each URL
corresponds to a unique piece of information whose movement through the
network can be traced. The data also includes the full social network
among "active users", in this case, anyone who tweeted a URL in the
three-week collection period. The data we used was collected in the
fall of 2010 [6]. The dataset included about 70 thousand distinct URLs,
3.5 million tweets, and 800 thousand users. We further filtered our
results to "very active" users, namely, users who tweeted at least 10
URLs during this time period.

Before we can calculate transfer entropy as presented in Eq. (2), we
need to specify the relevant bin widths. We take the finest resolution
to be $\delta_0 = 1$ second, the same resolution as presented by the
Twitter API. For binning of the history, we used the distribution of
observed re-tweet response times to motivate a choice of
$\delta_1 = 10$ min, $\delta_2 = 2$ hours, $\delta_3 = 24$ hours.
Although we saw a long tail of re-tweet times stretching into days, our
data were insufficient to include this weak effect. By limiting
ourselves to only three bins, we only have to sample over 8 possible
histories. Note that the activity is for any tweeting of URLs; our
calculations do not make use of the information encoded in the URL. We
then calculate the transfer entropy between each pair of users who are
connected.

The result of this procedure is the construction of a 
directed, weighted graph, where each edge in the original 
directed graph is now labeled by the calculated transfer 
entropy. We can now compare standard measures of in- 
fluence to measures based on this weighted graph. The 
simplest measure of influence on static graphs is to count 
the number of followers a user has. This ignores the fact 
that not all followers are the same, nor do followers re- 
act in the same way to different people that they follow. 
For instance, it may be that a recommendation from a 
close friend is worth more to a person than the same 
recommendation from five acquaintances. This problem 
is only exacerbated by the recent emergence of "follow- 
ers for pay" services, which seek to artificially inflate the 
number of followers to your Twitter account. In Fig. 8
we explore the comparison between out-degree and transfer
entropy and we find that although on average people
with more followers have more transfer entropy, two people
with the same number of followers may have vastly
different influence as measured by transfer entropy.
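
The two per-user quantities being compared can be computed from the weighted edge list as in the following Python sketch. It assumes each edge is stored as a `(source, follower, transfer_entropy)` triple, so that a user's out-degree equals the number of followers; the function name `outgoing_summary` is hypothetical.

```python
from collections import defaultdict


def outgoing_summary(edges):
    """Return {user: (number of followers, cumulative outgoing transfer entropy)}.

    edges: iterable of (source, follower, transfer_entropy) triples.
    """
    followers = defaultdict(int)
    outgoing = defaultdict(float)
    for source, follower, te in edges:
        followers[source] += 1
        outgoing[source] += te
    return {user: (followers[user], outgoing[user]) for user in followers}
```

Two users with the same follower count can then be told apart by the second entry of their tuple.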

FIG. 8: For each user, we compare the number of their followers to
their cumulative outgoing transfer entropy. Note that the outgoing
transfer entropy may differ by an order of magnitude for people with
the same number of followers.

To verify that transfer entropy is a meaningful quantity, we could test
how well the transfer entropy, based only on the timing of activity,
matches the measured flow of information, as determined by tracing
specific URLs. To that end, for each pair of connected users,
$X \to Y$, we count how many specific URLs were first tweeted by X and
then subsequently re-tweeted by Y. This number is compared to the
transfer entropy in Fig. 9. The existence of even a weak correlation is
surprising considering the limited amount of data and the fact that the
transfer entropy is not making use of URL or re-tweet information at
all. We also note that while a high number of re-tweets implies high
information transfer, a low number of re-tweets is uncorrelated with
information transfer. This makes sense because information transfer
measures influence that is not necessarily in the form of re-tweets; we
will give some examples below.

FIG. 9: The number of URLs that were first tweeted by user X and
subsequently tweeted by X's follower, Y, is correlated with the
calculated transfer entropy between X and Y, even though transfer
entropy is calculated only from the timing of activity, without regard
for specific URLs. Pearson's correlation coefficient is 0.22.

Table I shows the edges with the highest information transfer. These
accounts are all solely for the purpose of promotion. Taking the top
example, for instance, reveals that these two accounts will tweet
exactly the same message within a few seconds of each other. Note that
in the text of their tweets neither account uses re-tweets or an "@"
for attribution. Twitter specifically forbids indiscriminate automatic
re-tweets and has a policy against duplicate accounts. Many of the
accounts on this list have since been banned by Twitter.

TABLE I: List of edges with highest information transfer. All
are promotion accounts and many of the accounts have been
banned since the data were collected.
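
The URL count used for the comparison in Fig. 9 can be sketched in Python as follows. The sketch interprets "first tweeted by X and subsequently by Y" as: the first time X posted the URL precedes the first time Y posted it. That interpretation, the `(user, url, time)` record layout, and the function name are all assumptions.

```python
from collections import defaultdict


def count_url_passes(tweets, follows):
    """Count, per connected pair, the URLs the source posted before the follower.

    tweets: iterable of (user, url, time) records.
    follows: iterable of (source, follower) pairs.
    """
    first_time = defaultdict(dict)  # user -> {url: earliest posting time}
    for user, url, time in tweets:
        previous = first_time[user].get(url)
        if previous is None or time < previous:
            first_time[user][url] = time
    counts = {}
    for source, follower in follows:
        src = first_time.get(source, {})
        fol = first_time.get(follower, {})
        counts[(source, follower)] = sum(
            1 for url, t in src.items() if url in fol and t < fol[url]
        )
    return counts
```

The resulting counts can be paired edge by edge with the transfer entropy estimates to compute a correlation coefficient.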

To see more complex examples, we restrict ourselves to the top 1000
edges according to information transfer. Then we look at the largest
connected components. The largest component involved 600 users in
Brazil, most of whom had multiple tweets of the form
"BOMBE O SEU TWITTER, COM MILHARES DE
NOVOS FOLLOWERS, ATRAVES DO SITE: http://?
#QueroSeguidores", where "?" was a frequently changing URL. Google
translates this as "Pump up your Twitter, get thousands of new
followers, link to this site: http://? #I Want Followers." Clicking on
some of these links suggests that this is a "followback" service. You
agree to follow previous users who have signed up and in return other
users of the service follow your account. It also appears from the text
that you are required to re-tweet the link to get your followers. Some
other examples of high information transfer clusters are shown in
Fig. 10.
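
The selection of top edges and the grouping into connected components can be sketched in Python with a union-find structure. The sketch ignores edge direction when forming components, which is an assumption, and the function name `largest_components` is hypothetical.

```python
def largest_components(edges, k):
    """Components (sets of users, largest first) formed by the k highest-TE edges.

    edges: list of (source, target, transfer_entropy) triples.
    Edge direction is ignored when grouping users.
    """
    top = sorted(edges, key=lambda e: e[2], reverse=True)[:k]
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for a, b, _ in top:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two groups
    groups = {}
    for u in list(parent):
        groups.setdefault(find(u), set()).add(u)
    return sorted(groups.values(), key=len, reverse=True)
```

Calling `largest_components(edges, 1000)` on the weighted edge list would return the clusters in decreasing order of size.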

FIG. 10: (a) This cluster appears to be non-automated, and
revolves around fandom of singer Justin Bieber. (b) The cluster
of drug spam accounts. (c) An account which aggregates
soccer news by following and re-tweeting different regional
soccer accounts.

We consider another advantage of measuring influence through
information transfer by looking at two users who had almost the same
outgoing transfer entropy ($\approx 0.025$, in the top 20 for
individuals in our dataset), but vastly different behavior of
followers. The first Twitter account is SouljaBoy, a prominent American
rapper who is also very active in social media. The second account is
"silva_marina", the Twitter account of Marina Silva, a popular
Brazilian politician. This data was taken during the run-up to the
Brazilian presidential election, in which Marina Silva was a candidate;
she received 19.4% of the popular vote. At first it seems surprising
that SouljaBoy, who has six times the followers, should have a similar
outgoing transfer entropy to a politician known mostly in one country
and with fewer than a million Twitter followers. On the other hand,
Fig. 11 reveals the reason for this disparity. Marina Silva may have
fewer followers, but her effect on them tended to be much stronger.
Marina Silva's activity tended to be a better predictor of her
followers' behavior than Soulja Boy's activity was for his followers.




FIG. 11: A histogram showing the probability distribution of 
outgoing transfer entropy for followers of two different Twitter 
accounts. 



The strength of Marina Silva's influence along with the 
serendipitous timing before the Brazilian elections sug- 
gests another intriguing possibility. It seems likely that 
not only does transfer entropy vary for different follow- 
ers, it may vary over time as well. This suggests that 
a dynamic estimate of information transfer could detect 
changes in the importance of individuals in the network. 



IV. DISCUSSION 

We have presented a novel information-theoretic approach
for measuring influence. In contrast to previous
studies that focused on aggregate measures of influence,
the transfer entropy used here allows us to characterize 
and quantify the causal information flow for any pair of 
users. For a small number of users, this can allow us to 
reconstruct the network of connections from user activity 
alone. For large networks, this allows us to identify the 
most important links in the network. 

The method used here for calculating information 
transfer did not require any explicit causal knowledge 
in the form of re-tweets or other textual information. On 
the one hand, this may be an advantage in situations 
where such information is either missing or misleading, 
as was the case in the example for marketers on Twit- 
ter. On the other hand, we may be neglecting valuable 
information, and in the future we would like to incorpo- 
rate textual information in more sophisticated ways but 
still within an information-theoretic approach. Although 
this should be straightforward in principle, in practice 
entropy-based approaches require large amounts of data.
More complex signals require a commensurate increase 
in data. Therefore, the other main thrust of future work 
should be towards reducing data required for entropy es- 
timation, either through better bias correction or through 
binless approaches [17]. 

Because this measure has a rigorous interpretation in 
terms of predictability, it allows us to easily understand 
results that might otherwise seem anomalous. For in- 
stance, in one example we found that Marina Silva, the 
Brazilian presidential candidate, had high information 
transfer both to and from a Brazilian news service. Nei- 
ther Twitter account ever retweeted or explicitly men- 
tioned a tweet of the other. However, there was an ex- 
ternal cause, the upcoming debates and elections, that 
explains both of their activities. Without knowing this 
external cause, it is entirely consistent to say that either 
user's activity could help you predict the other's. In fact,
it may be possible to use this bi-directional predictability 
to identify external causes in the first place. 

Another result that is easy to understand in the con- 
text of predictability is the high incidence of "spam" in 
our results. This is no surprise since a large amount of 
spam is produced by automated systems and these sys- 
tems are intrinsically very predictable. Although identi- 
fying spam is a natural application of our analysis, some 
human behavior stood out as well. Diehard fandom also 
leads to quite predictable behavior.

Many existing notions of influence are static, ill- 
defined, ad hoc, or only apply in aggregate. Information 
transfer is a rigorously defined, dynamic measure capable 
of capturing fine-grain notions of influence and admitting 
a straightforward predictive interpretation. Many of the 
mathematical techniques necessary have already been de- 
veloped in the neuroscience literature and we have shown 
how to usefully adapt them to a social media context. 